-
-
-
-
-
-
- Network Working Group                                       F. Yergeau
- Internet Draft                                                G. Nicol
- <draft-ietf-html-i18n-00.txt>                                 G. Adams
- Expires 20 February 1996                                     M. Duerst
-                                                         15 August 1995
-
-
- Internationalization of the Hypertext Markup Language
-
-
- Status of this Memo
-
- This document is an Internet-Draft. Internet-Drafts are working doc-
- uments of the Internet Engineering Task Force (IETF), its areas, and
- its working groups. Note that other groups may also distribute work-
- ing documents as Internet-Drafts.
-
- Internet-Drafts are draft documents valid for a maximum of six
- months. Internet-Drafts may be updated, replaced, or obsoleted by
- other documents at any time. It is not appropriate to use Internet-
- Drafts as reference material or to cite them other than as a "working
- draft" or "work in progress".
-
- To learn the current status of any Internet-Draft, please check the
- 1id-abstracts.txt listing contained in the Internet-Drafts Shadow
- Directories on ds.internic.net (US East Coast), nic.nordu.net
- (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
- Rim).
-
- Distribution of this document is unlimited. Please send comments to
- the HTML working group (HTML-WG) of the Internet Engineering Task
- Force (IETF) at <html-wg@oclc.org>. Discussions of the group are
- archived at URL: http://www.acl.lanl.gov/HTML_WG/archives.html.
-
-
- Abstract
-
- The Hypertext Markup Language (HTML) is a simple markup language used
- to create hypertext documents that are platform independent. Until
- now, the application of HTML on the World Wide Web has been seriously
- restricted by its reliance on the ISO-8859-1 coded character set,
- which is appropriate only for Western European languages. Despite
- this restriction, HTML has been widely used with other languages,
- using other coded character sets or character encodings, through
- various ad hoc extensions to the language.
-
- This document is meant to address the issue of the internationaliza-
- tion of HTML by extending the specification of HTML 2.0 and giving
-
-
-
- Expires 20 February 1996 [Page 1]
-
- Internet Draft HTML internationalization 15 August 1995
-
-
- additional recommendations for proper internationalization support.
- A foremost consideration is to make sure that HTML remains a valid
- application of SGML, while enabling its use in all languages of the
- world.
-
- The "text/html; version=2.x" Internet Media Type [RFC1590] and MIME
- Content Type [RFC1521] is defined by this specification, taken
- together with the HTML 2.0 specification [HTML-2].
-
-
- Table of contents
-
- 1. Introduction
-   1.1. Scope
-   1.2. Conformance
- 2. The document character set
-   2.1. Reference processing model
-   2.2. The HTML 2.x document character set
-   2.3. Undisplayable characters
- 3. Language tags
- 4. Additional entities and elements
-   4.1. Full Latin-1 entity set
-   4.2. Date, time, measures and monetary amounts
-   4.3. Entities and elements for language-dependent presentation
- 5. Forms
-   5.1. DTD additions
-   5.2. Form submission
- 6. Miscellaneous
- 7. HTML public text
-   7.1. HTML DTD
-   7.2. SGML declaration for HTML
-   7.3. Entity sets
-     7.3.1. ISO Latin 1 character entity set
-     7.3.2. BIDI entity set
- Bibliography
- Authors' Addresses
-
-
- 1. Introduction
-
- The Hypertext Markup Language (HTML) is a simple markup language used
- to create hypertext documents that are platform independent. Until
- now, the application of HTML on the World Wide Web has been seriously
- restricted by its reliance on the ISO-8859-1 coded character set,
- which is appropriate only for Western European languages. Despite
- this restriction, HTML has been widely used with other languages,
- using other coded character sets or character encodings, through
- various ad hoc extensions to the language [TAKADA].
-
- This document is meant to address the issue of the internationaliza-
- tion of HTML by extending the specification of HTML 2.0 and giving
- additional recommendations for proper internationalization support.
- It is in good part based on a paper by one of the authors on multi-
- lingualism on the WWW [NICOL]. A foremost consideration is to make
- sure that HTML remains a valid application of SGML, while enabling
- its use in all languages of the world.
-
- The specific issues addressed are the SGML document character set to
- be used for HTML, the proper treatment of the charset parameter asso-
- ciated with the "text/html" content type and the specification of
- language tags and additional entities.
-
-
- 1.1. Scope
-
- HTML has been in use by the World-Wide Web (WWW) global information
- initiative since 1990. This specification extends the capabilities
- of HTML 2.0 (RFC xxx), primarily by removing the restriction to the
- ISO-8859-1 coded character set [ISO-8859-1]. Together with the HTML
- 2.0 specification, it defines a new version of HTML to be known as
- "HTML 2.x".
-
- HTML is an application of ISO Standard 8879:1986, Information
- Processing -- Text and Office Systems -- Standard Generalized Markup
- Language (SGML) [ISO-8879]. The HTML Document Type Definition (DTD) is
- a formal definition of the HTML syntax in terms of SGML. This
- specification amends the DTD of HTML 2.0 in order to make it
- applicable to documents encompassing a character repertoire much
- larger than that of ISO-8859-1, while still remaining SGML conformant.
-
- Together with the HTML 2.0 specification, this specification also
- defines HTML as an Internet Media Type [RFC1590] and MIME Content
- Type [RFC1521] called "text/html", or "text/html; version=2.x". As
- such, it defines the semantics of the HTML syntax and how that syntax
- should be interpreted by user agents.
-
-
- 1.2. Conformance
-
- This specification governs the syntax of HTML documents and aspects
- of the behavior of HTML user agents.
-
- 1.2.1. Documents
-
- A document is a conforming HTML document if:
-
- * It is a conforming SGML document, and it conforms to the HTML DTD
- (see 7.1, "HTML DTD").
-
- * It conforms to the application conventions in this specification.
- For example, the value of the HREF attribute of the <A> element
- must conform to the URI syntax.
-
- 1.2.2. User agents
-
- An HTML user agent conforms to this specification if:
-
- * It parses the characters of an HTML document into data characters
- and markup according to SGML [ISO-8879].
-
- NOTE -- In the interest of robustness and extensibility,
- there are a number of widely deployed conventions for han-
- dling non-conforming documents. See section 4.2.1 of the
- HTML 2.0 specification [HTML-2], "Undeclared Markup Error
- Handling" for details.
-
- * It supports at least the ISO-8859-1 character encoding scheme and
- processes each character in the ISO Latin Alphabet No. 1 as speci-
- fied in section 6.1 of [HTML-2].
-
- To ensure interoperability and proper support for at least
- ISO-8859-1 in an environment where character encoding schemes
- other than ISO-8859-1 are present, user agents must correctly
- interpret the charset parameter accompanying an HTML document
- received from the network.
-
- Furthermore, conforming user agents are required at least to parse
- correctly numeric character references outside the range of
- ISO-8859-1, but within that of UCS-2.
-
- NOTE -- To support non-western writing systems, HTML user
- agents are encouraged to support `ISO-10646-UCS-2' or simi-
- lar character encoding schemes and as much of the character
- repertoire of [ISO-10646] as is practical.
-
- * It behaves identically for documents whose parsed token sequences
- are identical.
-
- For example, comments and the whitespace in tags disappear during
- tokenization, and hence they do not influence the behavior of con-
- forming user agents.
-
- * It allows the user to traverse (or at least attempt to traverse,
- resources permitting) all hyperlinks from <A> elements in an HTML
- document.
-
- An HTML user agent is a level 2 user agent if, additionally:
-
- * It allows the user to express all form field values specified in
- an HTML document and to (attempt to) submit the values as requests
- to information services.
-
- 2. The document character set
-
- 2.1. Reference processing model
-
- This overview explains the reference processing model used for HTML
- 2.x, and in particular the SGML concept of a document character set.
- An actual implementation may differ widely in its internal workings
- from the model given below, but should behave as described to an
- outside observer.
-
- Because there are various widely differing encodings of text, SGML
- does not directly address the question of how characters are encoded
- e.g. in a file. SGML views the characters as a single set (called a
- "character repertoire"), and a "code set" that assigns an integer
- number (known as "character number") to each character in the reper-
- toire. The document character set declaration defines what each of
- the character numbers represents [GOLD90, p. 451]. In most cases, an
- SGML DTD and all documents that refer to it have a single document
- character set, and all markup and data characters are part of this
- set.
-
- HTML, as an application of SGML, does not directly address the ques-
- tion of how characters are encoded as octets in external representa-
- tions such as files. This is deferred to mechanisms external to HTML,
- such as the HTTP protocol, or MIME for electronic mail.
-
- For the HTTP protocol [HTTP], the way characters are encoded is
- defined by the "charset" parameter[1] added to the "Content-Type"
- field of the header of an HTTP response. For example, to indicate
- that the transmitted document is encoded in the "JIS" encoding of
- Japanese [RFC1468], the header will contain the following line:
-
- Content-Type: text/html; charset=ISO-2022-JP
-
- _________________________
- [1] The use of the keyword "charset" in MIME suggests that the
- corresponding parameter defines a character set in the sense used
- here. This is not so: the "charset" parameter actually specifies an
- encoding, i.e. the mapping of one (or several) character set(s) to
- octets.
-
- For HTTP, the default value of the charset parameter is ISO-8859-1
- (the so-called "Latin-1" for Western European characters). HTTP also
- defines a mechanism for the client to state the character encodings
- it can accept. Clients and servers are strongly urged to use these
- mechanisms to ensure correct transmission and interpretation of any
- document. Provisions that can help correct interpretation, even
- where a server or client does not yet use these mechanisms, are
- described in section 6.
-
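The handling of the "charset" parameter described above might be sketched as follows in Python. This is only an illustration under stated assumptions, not part of the specification: the names charset_of and decode_entity are hypothetical, and Python's str type (full Unicode) stands in for the document character set.

```python
from email.message import Message

# Default stated in the text above for documents received over HTTP.
DEFAULT_CHARSET = "ISO-8859-1"


def charset_of(content_type: str) -> str:
    """Return the charset parameter of a Content-Type value, or the default."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset", DEFAULT_CHARSET)


def decode_entity(octets: bytes, content_type: str) -> str:
    """Decode received octets from the external encoding named by the header."""
    return octets.decode(charset_of(content_type))


if __name__ == "__main__":
    body = "\u65e5\u672c\u8a9e".encode("iso2022_jp")
    print(decode_entity(body, "text/html; charset=ISO-2022-JP"))
```

A response that carries no charset parameter is decoded as ISO-8859-1, matching the default described above.
-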
- Similarly, if HTML documents are transferred by electronic mail, the
- character encoding is defined by the "charset" parameter of the "Con-
- tent-Type" MIME header line [RFC1521].
-
- If any other way of transferring or storing HTML documents is defined
- or becomes popular, similar provisions should be made to identify
- clearly the character encoding used, and/or to use a single default
- encoding capable of representing the widest range of characters used
- in an international context.
-
- Whatever the external character encoding may be, it is always
- translated to a representation of the document character set
- specified in Section 2.2 before any processing specific to SGML/HTML.
- The reference processing model can be depicted as follows:
-
-   [resource]->[decoder]->[entity ]->[ SGML ]->[application]->[display]
-                          [manager]  [parser]
-                              ^          |
-                              |          |
-                              +----------+
-
- The decoder is responsible for decoding the external representation
- of the resource to a representation using the document character set.
- The entity manager, the parser, and the application deal only with
- characters of the document character set. A display-oriented part of
- the application or the display machinery itself may again convert
- characters represented in the document character set to some other
- representation more suitable for their purpose. In any case, the
- entity manager, the parser, and the application, as far as character
- semantics are concerned, are using the HTML 2.x document character
- set only.
-
- An actual implementation may choose to translate the document into
- some encoding of the document character set as described above. How-
- ever, the behaviour described by this reference processing model can
- be achieved otherwise, in particular by using scan-suppression tech-
- niques. This subject is well out of the scope of this specification,
- however, and the reader is invited to consult the SGML standard
- [ISO-8879] or an SGML handbook [BRYAN88] [GOLD90] [VANH90] [SQ91] for
- further information.
-
- The most important consequence of this reference processing model is
- that numeric character references are always resolved to the same
- characters, whatever the external encoding actually used. For an
- example, see Section 2.2.
-
- 2.2. The HTML 2.x document character set
-
- The document character set, in the SGML sense, of HTML 2.x is the
- Basic Multilingual Plane of ISO 10646:1993 [ISO-10646], also known as
- UCS-2. This is code-by-code identical with the Unicode standard
- [UNICODE]. The adoption of this document character set implies a
- change in the SGML declaration specified in the HTML 2.0 specifica-
- tion (section 9.5 of [HTML-2]). The change amounts to removing the
- two BASESET specifications and their accompanying DESCSET declara-
- tions, replacing them with the following declaration:
-
-   BASESET  "ISO Registration Number 176//CHARSET
-             ISO/IEC 10646-1:1993 UCS-2 with implementation level 3
-             //ESC 2/5 2/15 4/5"
-   DESCSET      0      9  UNUSED
-                9      2  9
-               11      2  UNUSED
-               13      1  13
-               14     18  UNUSED
-               32     95  32
-              127      1  UNUSED
-              128     32  UNUSED
-              160  65376  160
-
- Making UCS-2 the document character set does not make non-conforming
- any expression, construct or document that conforms to HTML 2.0. It
- does make conforming certain constructs that are not admissible in
- HTML 2.0. One consequence is that data characters outside the
- repertoire of ISO-8859-1, but within that of UCS-2, become valid SGML
- characters. Another is that the upper limit of the range of numeric
- character references is extended from 255 to 65533[2]; thus, &#1048;
- is a valid reference to a "CYRILLIC CAPITAL LETTER I". [ERCS] is a
- good source of information on Unicode and SGML, although its scope
- and technical content differ greatly from those of this
- specification.
- _________________________
- [2] 65533 (FFFD hexadecimal) is the last valid character in UCS-2.
- 65534 (FFFE hexadecimal) is unassigned and reserved as the
- byte-swapped version of ZERO WIDTH NO-BREAK SPACE (FEFF hexadecimal)
- for byte-order detection purposes. 65535 (FFFF hexadecimal) is
- unassigned.
-
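The consequence for numeric character references can be illustrated with a short Python sketch. It is an assumption-laden illustration, not a conforming SGML parser: resolve_references and VALID_RANGES are hypothetical names, the ranges are read off the DESCSET declaration above with the upper bound of 65533 taken from the text, and Python's str type stands in for the document character set.

```python
import re

# Character numbers that the DESCSET declaration maps to real characters,
# capped at 65533 as stated in the text.
VALID_RANGES = [(9, 10), (13, 13), (32, 126), (160, 65533)]

NUMERIC_REFERENCE = re.compile(r"&#([0-9]+);")


def is_valid_number(number: int) -> bool:
    """True if the character number is assigned in the document character set."""
    return any(low <= number <= high for low, high in VALID_RANGES)


def resolve_references(text: str) -> str:
    """Replace valid numeric character references; leave invalid ones as-is."""
    def replace(match: re.Match) -> str:
        number = int(match.group(1))
        return chr(number) if is_valid_number(number) else match.group(0)
    return NUMERIC_REFERENCE.sub(replace, text)


if __name__ == "__main__":
    source = "&#1048; = \u0418"
    # The same document, transmitted in two different external encodings,
    # yields the same characters once decoded and resolved.
    for encoding in ("koi8_r", "utf_16"):
        decoded = source.encode(encoding).decode(encoding)
        print(encoding, resolve_references(decoded) == "\u0418 = \u0418")
```

Because references are resolved only after decoding, &#1048; denotes the same character whether the document arrived as KOI8-R or as UTF-16.
-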
- ISO 10646-1:1993 is the most encompassing character set currently
- existing, and there is no other character set that could take its
- place as the document character set for HTML 2.x. Also, it is
- expected that with future extensions of ISO 10646, this specification
- may also be extended. If nevertheless for a specific application
- there is a need to use characters outside this standard, this should
- be done by avoiding any conflicts with present or future versions of
- ISO 10646, i.e. by assigning these characters to a private zone.
- Also, it should be borne in mind that such a use will be highly
- unportable; in many cases, it may be better to use inline bitmaps.
-
- 2.3. Undisplayable characters
-
- With the document character set being the full ISO 10646 BMP, the
- possibility that a character cannot be displayed due to lack of
- appropriate resources (fonts) cannot be avoided. Because there are
- many different things that can be done in such a case, this document
- does not recommend any specific behaviour. Depending on the
- implementation, this may also be handled by the underlying display
- system and not the application itself. The following considerations,
- however, may be of help:
-
- - A clearly visible, but unobtrusive behaviour should be preferred.
- Some documents may contain many characters that cannot be rendered,
- and so showing an alert for each of them is not the right thing to
- do.
-
- - In case a numeric representation of the missing character is
- given, its hexadecimal (not decimal) form is to be preferred,
- because this form is used in character set standards [ERCS].
-
- 3. Language tags
-
- Language tags can be used to control rendering of a marked up docu-
- ment in various ways: character disambiguation, in cases where the
- character encoding is not sufficient to resolve to a specific glyph;
- quotation marks; hyphenation; ligatures; spacing; voice synthesis;
- etc. Independently of rendering issues, language markup is useful as
- content markup for purposes such as classification and searching.
-
- The language attribute, LANG, takes as its value a language tag that
- identifies a natural language spoken, written, or otherwise conveyed
- by human beings for communication of information to other human
- beings. Computer languages are explicitly excluded.
-
- The syntax and registry of HTML language tags is the same as that
- defined by RFC 1766 [RFC1766]. In summary, a language tag is composed
- of one or more parts: a primary language tag and a possibly empty
- series of subtags:
-
- language-tag = primary-tag *( "-" subtag )
- primary-tag = 1*8ALPHA
- subtag = 1*8ALPHA
-
- Whitespace is not allowed within the tag and all tags are case-
- insensitive. The namespace of language tags is administered by the
- IANA. Example tags include:
-
- en, en-US, en-cockney, i-cherokee, x-pig-latin
-
- Two-letter primary-tags are reserved for ISO 639 language abbrevia-
- tions [ISO-639], and three-letter primary-tags for the language
- abbreviations of ISO CD 639-2 [ISO-CD-639-2] (the latter is in addi-
- tion to the requirements of RFC 1766). Any two-letter initial subtag
- is an ISO 3166 country code [ISO-3166].
-
- In the context of HTML, a language tag is not to be interpreted as a
- single token, as per RFC 1766, but as a hierarchy. For example, a
- user agent that adjusts rendering according to language should con-
- sider that it has a match when a language tag in a style sheet entry
- matches the initial portion of the language tag of an element. An
- exact match should be preferred. This interpretation allows an ele-
- ment marked up as, for instance, "en-US" to trigger styles corre-
- sponding to, in order of preference, US-English ("en-US") or 'plain'
- or 'international' English ("en").
-
- NOTE -- using the language tag as a hierarchy does not
- imply that all languages with a common prefix will be
- understood by those fluent in one or more of those lan-
- guages; it simply allows the user to request this commonal-
- ity when it is true for that user.
-
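The tag syntax and the hierarchical matching rule described above can be sketched in Python as follows. This is a minimal sketch, assuming that style-sheet entries are keyed by language tag; is_language_tag and best_match are hypothetical names introduced only for illustration.

```python
import re
from typing import Iterable, Optional

# language-tag = primary-tag *( "-" subtag ), each part being 1*8ALPHA.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_language_tag(tag: str) -> bool:
    """Check a tag against the syntax summarized above."""
    return LANGUAGE_TAG.fullmatch(tag) is not None


def best_match(element_lang: str, available: Iterable[str]) -> Optional[str]:
    """Pick the available tag matching the longest initial portion.

    An exact match is tried first, then progressively shorter prefixes,
    so "en-US" prefers an "en-US" entry and falls back to "en".
    Comparison is case-insensitive, as the text requires.
    """
    by_lower = {tag.lower(): tag for tag in available}
    parts = element_lang.lower().split("-")
    for length in range(len(parts), 0, -1):
        candidate = "-".join(parts[:length])
        if candidate in by_lower:
            return by_lower[candidate]
    return None


if __name__ == "__main__":
    print(best_match("en-US", ["en", "fr"]))
    print(best_match("en-US", ["en", "en-us"]))
```

Note that the match runs in one direction only: an element marked "en" does not trigger an entry for "en-US".
-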
- Since any text can logically be assigned a language, almost all HTML
- elements admit the LANG attribute. The DTD reflects this. It is
- also intended that any new element introduced in later versions of
- HTML will admit the LANG attribute, unless there is a good reason not
- to do so.
-
- For the cases where a word or phrase differs only by language from
- the surrounding text, an element is needed as a container. This ele-
- ment is called LANG, and admits the LANG attribute.
-
- The rendering of elements is meant to be controlled (in part) by the
- LANG attribute. Specific user preferences set within the browser
- should override the value of the LANG attribute, which in turn over-
- rides the value specified by the LANG attribute of any enclosing ele-
- ment. If none of these are set, a suitable default, perhaps con-
- trolled by the user's locale, should be used to control rendering.
-
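The precedence among user preference, an element's own LANG attribute, the LANG of enclosing elements, and a default can be sketched as follows. The function effective_language and its parameters are hypothetical; the sketch assumes the chain of LANG values is available from the outermost element down to the element being rendered.

```python
from typing import Optional, Sequence


def effective_language(
    user_preference: Optional[str],
    lang_chain: Sequence[Optional[str]],
    locale_default: str,
) -> str:
    """Resolve the language used to control rendering of one element.

    lang_chain lists LANG attribute values from the outermost enclosing
    element down to the element itself; None means "no LANG attribute".
    """
    if user_preference:
        return user_preference          # user setting overrides everything
    for lang in reversed(lang_chain):   # innermost LANG wins
        if lang:
            return lang
    return locale_default               # nothing set: fall back to default


if __name__ == "__main__":
    print(effective_language(None, ["en", None, "fr-CA"], "de"))
```
-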
- 4. Additional entities and elements
-
- 4.1. Full Latin-1 entity set
-
- According to the suggestion of section 14 of [HTML-2], the set of
- Latin-1 entities is extended to cover the whole right part of
- ISO-8859-1. The names of the entities are taken from the appendices
- of [SGML]. A list is provided in section 7.3.1 of this specifica-
- tion.
-
- 4.2. Date, time, measures and monetary amounts
-
- One problem that faces the Web is that of data representation. Given
- the date "12/9/95", many people will think that this represents the
- 12th of September, 1995, while many others will think it represents
- December 9th. The same problem arises for many other data forms. It
- is desirable that the Web have a culture-neutral format for data, so
- that browsers can display the data in the most appropriate format for
- the end user. However, taking away all presentation choice from
- publishers is also a bad idea; hence, some way of supplying
- overridable presentation hints is also desirable. A set of elements
- is proposed below to address this problem.
-
-
- DATE This is used to store dates in such a way that formatting
- can be decided upon by the browser. It is desirable that
- the document author be able to provide the default format,
- with the end-user making the final decision. The formatting
- is determined by the combination of the CALENDAR and LANG
- attributes. The declaration of the DATE element is:
-
- <!ELEMENT DATE - O EMPTY>
- <!ATTLIST DATE
- %attrs;
- CALENDAR CDATA #IMPLIED --specify possible values? --
- VALUE CDATA #REQUIRED
- >
-
- If the CALENDAR attribute is not specified, the Gregorian
- calendar should be assumed, in which case, the format for
- the value of the VALUE attribute should be in yyyy-mm-dd
- format, as per ISO 8601:1988 [ISO-8601].
-
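To illustrate how a culture-neutral VALUE might be rendered differently per language, here is a small Python sketch. The FORMATS table and the render_date function are hypothetical and cover only two tags; a real user agent would draw on locale data and user preferences rather than a hard-coded table.

```python
from datetime import date

# Hypothetical presentation patterns keyed by lower-cased language tag.
FORMATS = {
    "en-us": "{m}/{d}/{yy:02d}",   # month first
    "en-gb": "{d}/{m}/{yy:02d}",   # day first
}


def render_date(value: str, lang: str) -> str:
    """Render a Gregorian yyyy-mm-dd VALUE according to the LANG attribute."""
    parsed = date.fromisoformat(value)      # raises ValueError if malformed
    pattern = FORMATS.get(lang.lower())
    if pattern is None:
        return value                        # unknown language: show as stored
    return pattern.format(d=parsed.day, m=parsed.month, yy=parsed.year % 100)


if __name__ == "__main__":
    print(render_date("1995-09-12", "en-GB"))
    print(render_date("1995-09-12", "en-US"))
```

The stored value is unambiguous; only the two renderings reproduce the "12/9/95" versus "9/12/95" confusion described at the start of this section.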
-
- TIME Like the DATE element, the TIME element is used to store
- time such that it is independent of geographical location
- and formatting. The declaration of the TIME element is:
-
- <!ELEMENT TIME - O EMPTY>
- <!ATTLIST TIME
- %attrs;
- ZONE CDATA #IMPLIED
- VALUE CDATA #REQUIRED
- >
-
- The contents of VALUE should be in hh:mm:ss.ss format. ZONE
- should contain a string representing the offset of the zone
- from GMT of the form "+HHMM" or "-HHMM". If omitted, Uni-
- versal Time (GMT) should be assumed. For example, <TIME
- ZONE="-0500" VALUE="11:35:04"> represents eleven hours
- thirty-five minutes and four seconds after midnight in
- Eastern North America, which is 16:35:04 GMT.
-
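The ZONE arithmetic in the example above can be checked with a short Python sketch. The helper names parse_zone and to_gmt are hypothetical; the sketch assumes VALUE is hh:mm:ss with an optional fractional part and returns only the time of day, ignoring any change of date.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_zone(zone: Optional[str]) -> timezone:
    """Turn a "+HHMM" or "-HHMM" string into a fixed offset; None means GMT."""
    if zone is None:
        return timezone.utc
    sign = -1 if zone[0] == "-" else 1
    offset = timedelta(hours=int(zone[1:3]), minutes=int(zone[3:5]))
    return timezone(sign * offset)


def to_gmt(value: str, zone: Optional[str] = None) -> str:
    """Return the GMT time of day (hh:mm:ss) for a TIME element."""
    pattern = "%H:%M:%S.%f" if "." in value else "%H:%M:%S"
    local = datetime.strptime(value, pattern).replace(tzinfo=parse_zone(zone))
    return local.astimezone(timezone.utc).strftime("%H:%M:%S")


if __name__ == "__main__":
    print(to_gmt("11:35:04", "-0500"))
```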
-
- MEASURE This element is designed to allow measurements to be marked
- up such that they can be converted between systems, and
- also to allow some formatting flexibility. The declaration
- of the MEASURE element is:
-
- <!ELEMENT MEASURE - O EMPTY>
- <!ATTLIST MEASURE
- %attrs;
- TYPE (mass|length|area|volume|temp|dur) #REQUIRED
- UNIT CDATA #IMPLIED
- VALUE CDATA #REQUIRED
- >
-
- This is a variation of the TEI MEASURE element [TEI]. The
- TYPE attribute specifies the type of measurement being
- represented. The UNIT attribute indicates the measurement
- unit, and defaults to the applicable SI unit [ISO-1000] if
- not specified. The VALUE attribute specifies the amount in
- that unit. The contents of the VALUE attribute should be
- parseable using the float_constant pattern from the
- following lex(1) definition:
-
-   digit           [0-9]
-   exponent        [eE][+-]?{digit}+
-   i               {digit}+
-   float_constant  [+-]?({i}|({i}\.{i}?)|({i}?\.{i})){exponent}?
-
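For readers more familiar with regular expressions than with lex(1), the float_constant pattern might be transcribed into Python as below. This is a hedged transcription rather than a normative definition: it assumes the dot in the pattern is meant as a literal decimal point, and parse_value is a hypothetical helper.

```python
import re

DIGITS = r"[0-9]+"                  # {i}
EXPONENT = r"[eE][+-]?[0-9]+"       # {exponent}

# [+-]?({i}|({i}.{i}?)|({i}?.{i})){exponent}? with a literal decimal point.
FLOAT_CONSTANT = re.compile(
    rf"[+-]?(?:{DIGITS}|{DIGITS}\.(?:{DIGITS})?|(?:{DIGITS})?\.{DIGITS})"
    rf"(?:{EXPONENT})?"
)


def parse_value(value: str) -> float:
    """Validate a VALUE attribute against the pattern and convert it."""
    if FLOAT_CONSTANT.fullmatch(value) is None:
        raise ValueError(f"not a float_constant: {value!r}")
    return float(value)


if __name__ == "__main__":
    for sample in ("12", "1.", ".5", "-2.5e3"):
        print(sample, parse_value(sample))
```

A value such as "1,5" or a bare "." is rejected, since the pattern requires digits on at least one side of the decimal point.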
-
- MONEY This element is designed to represent monetary amounts,
- such that conversion between systems and formatting varia-
- tions are possible. The definition of this element is:
-
- <!ELEMENT MONEY - O EMPTY>
- <!ATTLIST MONEY
- %attrs;
- UNIT CDATA #REQUIRED
- VALUE CDATA #REQUIRED
- >
-
- The UNIT attribute specifies the currency unit, using the
- abbreviations of ISO 4217 [ISO-4217]. The VALUE attribute
- contains the amount, and should follow the lexical model of
- the VALUE attribute of the MEASURE element. It is conceiv-
- able that the functionality of this element could be made
- part of MEASURE.
-
- It should be noted that there are many special cases involving the
- representation of data. For example, many people in New Zealand still
- use miles, even though New Zealand has officially adopted the metric
- system. Worse, some people use miles when they mean kilometers. This
- proposal does not aim to handle all such cases, but rather to strike
- a reasonable balance between usability and accuracy. When format is
- of the utmost importance, these tags need not be used.
-
-
- 4.3. Entities and elements for language-dependent presentation
-
- For the correct presentation of text from certain languages
- (irrespective of formatting issues), some support in the form of
- additional entities and elements is needed. In particular,
- bidirectional text (BIDI for short) requires markup in special
- circumstances where ambiguities as to the directionality of some
- characters have to be resolved. First, a set of named character
- entities is added that allows full support of the Unicode
- bidirectional algorithm [UNICODE], plus some help with languages
- requiring contextual analysis for rendering:
-
- <!ENTITY zwnj SDATA "[zwnj ]"--=zero width non-joiner-->
- <!ENTITY zwj SDATA "[zwj ]"--=zero width joiner-->
- <!ENTITY lrm SDATA "[lrm ]"--=left-to-right mark-->
- <!ENTITY rlm SDATA "[rlm ]"--=right-to-left mark-->
- <!ENTITY lre SDATA "[lre ]"--=left-to-right embedding-->
- <!ENTITY rle SDATA "[rle ]"--=right-to-left embedding-->
- <!ENTITY pdf SDATA "[pdf ]"--=pop directional formatting-->
- <!ENTITY lro SDATA "[lro ]"--=left-to-right override-->
- <!ENTITY rlo SDATA "[rlo ]"--=right-to-left override-->
-
- These correspond to the following characters from ISO/IEC
- 10646-1:1993 (with the equivalent numeric character reference added
- at the right):
-
-   0x200C  ZERO WIDTH NON-JOINER        &#8204;
-   0x200D  ZERO WIDTH JOINER            &#8205;
-   0x200E  LEFT-TO-RIGHT MARK           &#8206;
-   0x200F  RIGHT-TO-LEFT MARK           &#8207;
-   0x202A  LEFT-TO-RIGHT EMBEDDING      &#8234;
-   0x202B  RIGHT-TO-LEFT EMBEDDING      &#8235;
-   0x202C  POP DIRECTIONAL FORMATTING   &#8236;
-   0x202D  LEFT-TO-RIGHT OVERRIDE       &#8237;
-   0x202E  RIGHT-TO-LEFT OVERRIDE       &#8238;
-
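The correspondence between the entity names and the character numbers can be cross-checked against the Unicode character database shipped with Python. The table BIDI_ENTITIES and the function numeric_reference are hypothetical names used only for this illustration.

```python
import unicodedata

# Entity name -> character number, as listed in the table above.
BIDI_ENTITIES = {
    "zwnj": 0x200C,
    "zwj": 0x200D,
    "lrm": 0x200E,
    "rlm": 0x200F,
    "lre": 0x202A,
    "rle": 0x202B,
    "pdf": 0x202C,
    "lro": 0x202D,
    "rlo": 0x202E,
}


def numeric_reference(entity_name: str) -> str:
    """Return the decimal numeric character reference for a BIDI entity."""
    return f"&#{BIDI_ENTITIES[entity_name]};"


if __name__ == "__main__":
    for name, number in BIDI_ENTITIES.items():
        print(f"&{name}; {numeric_reference(name)} {unicodedata.name(chr(number))}")
```
-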
- These entities affect the ability to render BIDI text in a
- semantically legible fashion. That is, without these special BIDI
- characters, cases arise which would prevent *any* rendering
- whatsoever that reflected the basic meaning of the text. It is for
- this reason that these special characters were added to Unicode
- (and, thence, to ISO/IEC 10646). If it were possible to do reliable
- layout and rendering of bidirectional text without them, they
- definitely would not have been included in Unicode (at least not the
- stateful characters: LRE, RLE, LRO, RLO, and PDF). They are needed
- for the following:
-
- 1. RTL MARK, LTR MARK - used to disambiguate directionality
- of directionally neutral characters, e.g., if you have a
- double quote sitting between an Arabic and a Latin letter,
- then which direction does the quote resolve to? These
- characters are like zero width spaces which have a direc-
- tional property (but no word/line break property).
-
- 2. ZWJ, ZWNJ - used to force or block joining behavior in
- contexts in which joining would occur but should not, or would
- not occur but should. For example, ARABIC LETTER HEH is
- used to abbreviate "Hijri" (the Islamic calendrical sys-
- tem); however, the isolated form of HEH looks like the
- digit five as employed in Arabic script (actually based on
- Indic digits). In order to prevent one from reading HEH as
- a final digit five in a year, the initial form of HEH is
- used. However, there is no following context (i.e., a
- joining letter) to which the HEH can join. Therefore, the
- ZWJ is used to provide that context. In Farsi texts, there
- are cases where a letter that normally would join a subse-
- quent letter in a cursive connection does not. Here the
- ZWNJ is used.
-
- 3. RTL EMBEDDING, LTR EMBEDDING - used to handle nested
- directional runs, as in the following example.
-
- Given the following latin/arabic letters in backing store
- with the specified embeddings:
-
- LRE L0 L1 RLE A0 A1 LRE L2 L3 PDF A2 A3 PDF L4 L5 PDF
-
- One gets the following rendering (with [] showing the
- directional transitions):
-
- [ L0 L1 [ A3 A2 [ L2 L3 ] A1 A0 ] L4 L5 ]
-
- On the other hand, without these characters, e.g., with
-
- L0 L1 A0 A1 L2 L3 A2 A3 L4 L5
-
- and a base level of LTR one gets the following rendering:
-
- [ L0 L1 [ A1 A0 ] L2 L3 [ A3 A2 ] L4 L5 ]
-
- Notice that A1,A0 is on the left and A3,A2 on the right
- unlike the above case where the embedding levels are used.
- Without the embedding characters one has at most two lev-
- els: a base directional level and a single counterflow
- directional level.
-
- A common need for the embedding characters is to handle
- text that has been pasted from one bidi context to another,
- possibly through several levels of nested pasting.
-
- 4. LTR OVERRIDE, RTL OVERRIDE - these are needed to deal
- with unusual pieces of text in which directionality cannot
- be resolved from context in an unambiguous fashion. For
- example, in part numbers, formulas, telephone numbers, and
- other similar pieces of text, it is difficult or impossible
- to derive the directionality of numbers, punctuation, and
- other neutrals from their context.
-
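The embedding example in item 3 above can be reproduced with a much-simplified Python sketch of level assignment and reordering. It is not an implementation of the full Unicode bidirectional algorithm: it assumes tokens named "L..." are strong left-to-right letters, tokens named "A..." are strong right-to-left letters, and it handles only LRE, RLE and PDF. The function names are hypothetical.

```python
from typing import List, Tuple


def embedding_levels(tokens: List[str], base: int = 0) -> List[Tuple[str, int]]:
    """Assign an embedding level to each letter token."""
    level, stack, result = base, [], []
    for token in tokens:
        if token == "LRE":
            stack.append(level)
            level = level + 2 - (level % 2)    # next greater even level
        elif token == "RLE":
            stack.append(level)
            level = level + 1 + (level % 2)    # next greater odd level
        elif token == "PDF":
            if stack:
                level = stack.pop()
        else:
            right_to_left = token.startswith("A")
            resolved = level
            # A letter running against its level is raised by one level.
            if right_to_left != (level % 2 == 1):
                resolved += 1
            result.append((token, resolved))
    return result


def visual_order(tokens: List[str], base: int = 0) -> List[str]:
    """Reverse runs from the highest level down to the lowest odd level."""
    pairs = embedding_levels(tokens, base)
    letters = [token for token, _ in pairs]
    levels = [level for _, level in pairs]
    odd_levels = [level for level in levels if level % 2 == 1]
    if not odd_levels:
        return letters
    for current in range(max(levels), min(odd_levels) - 1, -1):
        index = 0
        while index < len(letters):
            if levels[index] >= current:
                end = index
                while end < len(letters) and levels[end] >= current:
                    end += 1
                letters[index:end] = letters[index:end][::-1]
                index = end
            else:
                index += 1
    return letters


if __name__ == "__main__":
    embedded = "LRE L0 L1 RLE A0 A1 LRE L2 L3 PDF A2 A3 PDF L4 L5 PDF".split()
    plain = "L0 L1 A0 A1 L2 L3 A2 A3 L4 L5".split()
    print(" ".join(visual_order(embedded)))
    print(" ".join(visual_order(plain)))
```

With the embedding codes the Arabic run encloses L2 L3, so A3 A2 ends up on the left of it; without them, each Arabic pair is reversed in place.
-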
- To handle the case of the directional controls appearing directly in
- the text as coded characters, a new element, entities and SHORTREFS
- are defined:
-
- <!ELEMENT BIDI - - (%text)+>
- <!ATTLIST BIDI
- %attrs;
- DIR (ltr|rtl) #IMPLIED
- FORCE (gad|dag) #IMPLIED
- >
-
- The DIR attribute corresponds to the 'embedding' entities (lre and
- rle), while the FORCE attribute corresponds to the 'override' ones
- (lro and rlo). Different allowed values of these attributes have to
- be used because of the quixotic semantics of SGML regarding tokens in
- name token groups (a given token may appear only once in an attribute
- definition list, even in different groups). To support the occurrence
- of Unicode BIDI characters in text (as coded characters), the
- following is defined:
-
- <!ENTITY lretag "<BIDI DIR=LTR>" >
- <!ENTITY rletag "<BIDI DIR=RTL>" >
- <!ENTITY lrotag "<BIDI FORCE=GAD>" >
- <!ENTITY rlotag "<BIDI FORCE=DAG>" >
- <!ENTITY pdftag "</BIDI>" >
- <!SHORTREF bidi "&#LRE;" lretag
- "&#RLE;" rletag
- "&#LRO;" lrotag
- "&#RLO;" rlotag
- "&#PDF;" pdftag
- >
-
- In this case LRE, RLE, LRO, RLO, and PDF have to be declared as func-
- tion names (mapped to the appropriate character numbers) in the SGML
- declaration's concrete syntax:
-
- FUNCTION
- LRE FUNCHAR 8234 -- LEFT-TO-RIGHT EMBEDDING --
- RLE FUNCHAR 8235 -- RIGHT-TO-LEFT EMBEDDING --
- PDF FUNCHAR 8236 -- POP DIRECTIONAL FORMATTING --
- LRO FUNCHAR 8237 -- LEFT-TO-RIGHT OVERRIDE --
- RLO FUNCHAR 8238 -- RIGHT-TO-LEFT OVERRIDE --
-
- The above shortrefs and <BIDI> element allow dealing with existing
- text containing bidi controls, and doing so in the framework of
- marked up text.
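
As an illustration only, the effect of the short references can be
imitated outside an SGML parser by replacing each bidi control
character with the markup its entity expands to. The Python sketch
below assumes the text is already available as Unicode; the table and
function names are hypothetical, and the code points are the ones
listed in the FUNCTION section above.

```python
# Map each bidi control character to the markup defined by the
# lretag, rletag, lrotag, rlotag and pdftag entities.
BIDI_MARKUP = {
    0x202A: "<BIDI DIR=LTR>",    # LRE, decimal 8234
    0x202B: "<BIDI DIR=RTL>",    # RLE, decimal 8235
    0x202C: "</BIDI>",           # PDF, decimal 8236
    0x202D: "<BIDI FORCE=GAD>",  # LRO, decimal 8237
    0x202E: "<BIDI FORCE=DAG>",  # RLO, decimal 8238
}


def controls_to_markup(text):
    """Replace coded bidi controls with the equivalent BIDI tags."""
    return text.translate(BIDI_MARKUP)


print(controls_to_markup("abc \u202bDEF\u202c ghi"))
```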
-
- One more element is important for proper language-dependent
- rendering. Short quotations, and in particular the quotation marks
- surrounding them, are typically rendered differently in different
- languages and on platforms with different graphic capabilities:
- "a quotation in English", `another, slightly better one', ,,a
- quotation in German", << a quotation in French >>. The <Q> element
- is introduced for that purpose.
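
A user agent might choose the quotation marks for a <Q> element from
the language in effect, roughly as in the Python sketch below. The
table only holds the ASCII renderings quoted in the paragraph above;
the function name, the table and the fallback to the English marks are
assumptions made for the example.

```python
# Opening and closing marks per primary language tag, taken from the
# ASCII examples given in the text.
QUOTE_MARKS = {
    "en": ('"', '"'),
    "de": (",,", '"'),
    "fr": ("<< ", " >>"),
}


def render_q(content, lang="en"):
    """Wrap the content of a Q element in language-dependent marks."""
    primary = lang.split("-")[0].lower()   # "fr-CA" -> "fr"
    opening, closing = QUOTE_MARKS.get(primary, QUOTE_MARKS["en"])
    return f"{opening}{content}{closing}"


print(render_q("a quotation in German", "de"))
```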
-
- 5. Forms
-
-
- 5.1. DTD additions
-
- It is natural to expect input in any language in forms, as they pro-
- vide one of the only ways of obtaining user input. While this is
- primarily a UI issue, there are some things that should be specified
- at the HTML level to guide behavior and promote interoperability.
-
- One is to add variants of the INPUT element corresponding to the
- elements described in section 4.2, thus allowing locale-independent
- transmission of dates, times, etc. to a server. Specifically, DATE,
- TIME, MEASURE and MONEY are added as possible values of the TYPE
- attribute of the INPUT element. Prior to transmission, the data
- should be converted to a canonical form, where possible. For example,
- if a user entered "24/12/1996" into a DATE field, it should be
- converted to "1996-12-24" when transmitted. Where this is not
- possible, information corresponding to the attributes of those
- elements needs to be transmitted as well. This can be accomplished by
- expanding the capabilities of the value part of the name-value pairs
- used to transmit forms data. The following syntax is recommended:
-
- forms-data = pair-list*
- pair-list = pair ";" pair-list | pair
- pair = name "=" value
- name = text
- value = simple-value | complex-value
- simple-value = text
- complex-value = "(" pair-list* ")"
-
- In complex-values, the name of the attribute is used as the name part
- of the name-value pair, "value" being the most common one. For
- example, a date might be transmitted as:
-
- date=(value=24/12/96;calendar=gregorian;lang=en-uk)
-
- suitably encoded.
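
The canonical-form conversion and the pair syntax can be sketched in
Python as follows. The sketch assumes the user typed the date in
day/month/year order and leaves out the final encoding step ("suitably
encoded"); the function names are hypothetical.

```python
from datetime import datetime


def to_canonical_date(text, input_format="%d/%m/%Y"):
    """Convert a locale-specific date string to the YYYY-MM-DD form."""
    return datetime.strptime(text, input_format).strftime("%Y-%m-%d")


def serialize_value(value):
    """A value is either plain text or a parenthesized pair-list."""
    if isinstance(value, str):
        return value
    return "(" + serialize_pairs(value) + ")"


def serialize_pairs(pairs):
    """Join name=value pairs with ';' as in the pair-list production."""
    return ";".join(f"{name}={serialize_value(value)}" for name, value in pairs)


print(to_canonical_date("24/12/1996"))
print(serialize_pairs([
    ("date", [("value", "24/12/96"),
              ("calendar", "gregorian"),
              ("lang", "en-uk")]),
]))
```

The second call prints the complex-value example given above; the
first shows the preferred case, where the value can be converted and
no extra attributes need to travel with it.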
-
- To ensure interoperability, it is necessary for the user agent (and
- the user) to have an indication of the character set(s) that the
- server providing a form will be able to handle upon submission of the
- filled-in form. Such an indication is provided by the ACCEPT-CHARSET
- attribute of the FORM element, modeled on the HTTP Accept-Charset
- header (see [HTTP]), which contains a space and/or comma delimited
- list of character sets acceptable to the server. A user agent may
- want to advise the user of the contents of this attribute, or to
- prevent the user from entering unacceptable characters.
-
- NOTE -- The list of character sets is to be interpreted as
- an EXCLUSIVE-OR list; the server announces that it is ready
- to accept any ONE of these character encoding schemes for
- each part of a multipart entity.
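
One way a user agent could act on ACCEPT-CHARSET is sketched below in
Python: split the attribute value on spaces and commas, then take the
first listed character set that can represent the text entered. The
function names are hypothetical, and relying on Python's codec
registry to recognize the MIME charset names is an assumption of the
example.

```python
import re


def parse_accept_charset(value):
    """Split a space and/or comma delimited list of charset names."""
    return [name for name in re.split(r"[,\s]+", value.strip()) if name]


def pick_charset(text, accept_charset):
    """Return the first acceptable charset able to encode the text."""
    for name in parse_accept_charset(accept_charset):
        try:
            text.encode(name)
        except (LookupError, UnicodeEncodeError):
            continue        # unknown charset, or text not representable
        return name
    return None


print(parse_accept_charset("ISO-8859-1, ISO-2022-JP"))
print(pick_charset("\u65e5\u672c\u8a9e", "ISO-8859-1, ISO-2022-JP"))
```

Because exactly one charset is chosen per part, the sketch follows the
EXCLUSIVE-OR reading given in the note.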
-
- 5.2. Form submission
-
- The HTML 2.0 form submission mechanism, based on the "application/x-
- www-form-urlencoded" media type, is hopelessly broken with regard to
- internationalization. In fact, since URLs are restricted to ASCII
- characters, the mechanism is broken even for ISO-8859-1 text. Sec-
- tion 2.2 of [RFC1738] specifies that octets may be encoded using the
- "%HH" notation, but text submitted from a form is composed of charac-
- ters, not octets. Lacking a specification of a character encoding
- scheme, the "%HH" notation has no meaning.
-
- A partial solution to this sorry state of affairs is to specify a
- default character encoding scheme to be assumed when the GET method
- of form submission is used. Specifying UCS-2 would break all exist-
- ing forms, so the only sensible way is to designate ISO-8859-1. That
- is, the encoded URL sent to submit a form by the GET method is to be
- interpreted as a sequence of single-octet characters encoded accord-
- ing to ISO-8859-1, and further encoded according to the scheme of
- [RFC1738] (the "%HH" notation). This is clearly insufficient, so the
- GET method of form submission is deprecated and should not be used in
- future documents, despite the language of section XX of [HTML-2].
-
- A better solution is to add a MIME charset parameter to the Content-
- Type header sent along with a POST method form submission, with the
- understanding that the URL encoding of [RFC1738] is applied on top of
- the specified character encoding, as a kind of implicit Content-
- Transfer-Encoding. The default ISO-8859-1 is to be implied in the
- absence of a charset parameter.
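
The dependence of the "%HH" notation on the character encoding scheme
can be seen in a short Python sketch using the standard urllib.parse
module. The helper name encode_form is hypothetical, and the
Content-Type value it returns simply follows the proposal above of
adding a charset parameter, with ISO-8859-1 as the default.

```python
from urllib.parse import quote, urlencode

# The same character yields different octets, hence different %HH
# sequences, under different character encoding schemes.
print(quote("\u00e9", encoding="ISO-8859-1"))
print(quote("\u00e9", encoding="UTF-8"))


def encode_form(fields, charset="ISO-8859-1"):
    """Return (Content-Type, body) for a POST form submission."""
    body = urlencode(fields, encoding=charset)
    content_type = f"application/x-www-form-urlencoded; charset={charset}"
    return content_type, body


print(encode_form({"name": "Fran\u00e7ois"}))
```

A receiver that knows the charset can undo the two layers in order:
first the URL encoding, then the character encoding.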
-
- The best solution is to use the "multipart/form-data" media type
- described in [FILE-UPLOAD] with the POST method of form submission.
- This mechanism encapsulates the value part of each name-value pair in
- a body-part of a multipart MIME body that is sent as the HTTP entity;
- each body part can be labeled with an appropriate Content-Type,
- including if necessary a charset parameter that specifies the charac-
- ter encoding scheme. The changes to the DTD necessary to support
- this method of form submission have been incorporated in the DTD
- included in this specification.
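
A minimal sketch of such a multipart body, written in Python, is given
below. It only illustrates the labeling of each body part with its own
charset; the function name and the boundary string are hypothetical,
and details covered by [FILE-UPLOAD] (file parts, boundary selection)
are left out.

```python
def build_multipart(fields, boundary):
    """Build a multipart/form-data body.

    fields is a list of (name, text, charset) tuples; each value is
    encoded with its own charset and labeled accordingly.
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, text, charset in fields:
        lines.append(delimiter)
        lines.append(
            f'Content-Disposition: form-data; name="{name}"'.encode("ascii"))
        lines.append(
            f"Content-Type: text/plain; charset={charset}".encode("ascii"))
        lines.append(b"")                    # blank line ends the part headers
        lines.append(text.encode(charset))   # value in the declared charset
    lines.append(delimiter + b"--")          # closing delimiter
    lines.append(b"")
    return b"\r\n".join(lines)


body = build_multipart(
    [("city", "Montr\u00e9al", "ISO-8859-1"),
     ("greeting", "\u3053\u3093\u306b\u3061\u306f", "ISO-2022-JP")],
    boundary="example-boundary",
)
print(body.decode("ascii", errors="replace"))
```

The HTTP entity would then be sent with a Content-Type of
multipart/form-data carrying the same boundary parameter.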
-
- How the user agent determines the encoding of the text entered by the
- user is outside the scope of this specification.
-
- 6. Miscellaneous
-
- Proper interpretation of a text document requires that the character
- encoding scheme be known. Current HTTP servers, however, do not gen-
- erally include an appropriate charset parameter with the Content-Type
- header, even when the encoding scheme is different from the default
- ISO-8859-1. This is bad behaviour, and as such strongly discouraged,
- but some preventive measures can be taken to minimize the detrimental
- effects.
-
- For the case where a document is reached through a hyperlink in an
- origin HTML document, a CHARSET attribute is added to the attribute
- list of elements with link semantics (A and LINK), specifically by
- adding it to the linkExtraAttributes entity. The value of that
- attribute is to be considered a hint to the User Agent as to the
- character encoding scheme used by the resource pointed to by the
- hyperlink; it should be the appropriate value of the MIME charset
- parameter for that resource.
-
- In any document, it may be wise to include an indication of the
- encoding scheme like the following, as early as possible within the
- HEAD of the document:
-
- <META HTTP-EQUIV="Content-Type"
- CONTENT="text/html; charset=ISO-2022-JP">
-
- This is not foolproof, but will work if the encoding scheme is such
- that ASCII characters stand for themselves at least until the META
- element is parsed.
-
- For definiteness, the "charset" parameter received from the source of
- the document should be considered the most authoritative, followed in
- order of preference by the contents of a META element such as the
- above, and finally the CHARSET attribute of the anchor that was
- followed (if any).
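
The order of preference can be sketched in Python as follows. The
regular expression is a rough stand-in for real parsing of the META
element, the 1024-octet search window is an arbitrary assumption, and
the function names are hypothetical.

```python
import re

# Rough match for <META HTTP-EQUIV="Content-Type" CONTENT="...charset=X">.
# It works on raw octets, which is why ASCII characters must stand for
# themselves up to this point in the document.
META_RE = re.compile(
    rb'<META\s+HTTP-EQUIV\s*=\s*"?Content-Type"?\s+'
    rb'CONTENT\s*=\s*"[^"]*charset=([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(document_bytes, window=1024):
    """Look for a charset announced by a META element near the start."""
    match = META_RE.search(document_bytes[:window])
    return match.group(1).decode("ascii") if match else None


def effective_charset(header_charset, document_bytes, anchor_charset):
    """Apply the precedence: header, then META, then anchor, then default."""
    return (header_charset
            or sniff_meta_charset(document_bytes)
            or anchor_charset
            or "ISO-8859-1")


doc = (b'<HEAD><META HTTP-EQUIV="Content-Type"\n'
       b' CONTENT="text/html; charset=ISO-2022-JP"></HEAD>')
print(effective_charset(None, doc, "EUC-JP"))
```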
-
- 7. HTML Public Text
-
- 7.1. HTML DTD
-
- <!-- html-2.x.dtd
-
- Document Type Definition for the HyperText Markup Language,
- version 2.x (HTML DTD)
-
- Authors: Daniel W. Connolly <connolly@w3.org>
- François Yergeau <yergeau@alis.com>
- -->
-
- <!ENTITY % HTML.Version
- "-//IETF//DTD HTML 2.x//EN"
-
- -- Typical usage:
-
- <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.x//EN">
- <html>
- ...
- </html>
- --
- >
-
-
- <!--============ Feature Test Entities ========================-->
-
- <!ENTITY % HTML.Recommended "IGNORE"
- -- Certain features of the language are necessary for
- compatibility with widespread usage, but they may
- compromise the structural integrity of a document.
- This feature test entity enables a more prescriptive
- document type definition that eliminates
- those features.
- -->
-
- <![ %HTML.Recommended [
- <!ENTITY % HTML.Deprecated "IGNORE">
- ]]>
-
- <!ENTITY % HTML.Deprecated "INCLUDE"
- -- Certain features of the language are necessary for
- compatibility with earlier versions of the specification,
- but they tend to be used and implemented inconsistently,
- and their use is deprecated. This feature test entity
- enables a document type definition that eliminates
- these features.
- -->
-
- <!ENTITY % HTML.Highlighting "INCLUDE"
- -- Use this feature test entity to validate that a
- document uses no highlighting tags, which may be
- ignored on minimal implementations.
- -->
-
- <!ENTITY % HTML.Forms "INCLUDE"
- -- Use this feature test entity to validate that a document
- contains no forms, which may not be supported in minimal
- implementations
- -->
-
- <!ENTITY % HTML.Bidi "INCLUDE"
- -- Use this feature test entity to validate that a document
- does not use the BIDI element, entities and SHORTREFs,
- which may not be supported in some implementations
- -->
-
- <!--============== Imported Names ==============================-->
-
- <!ENTITY % Content-Type "CDATA"
- -- meaning an internet media type
- (aka MIME content type, as per RFC1521)
- -->
-
- <!ENTITY % HTTP-Method "GET | POST"
- -- as per HTTP specification, in progress
- -->
-
- <!ENTITY % URI "CDATA"
- -- The term URI means a CDATA attribute
- whose value is a Uniform Resource Identifier.
- The syntax is defined by
-
- RFC 1808, "Relative Uniform Resource Locators."
- R. Fielding, June 1995
-
- Note that CDATA attributes are limited by the LITLEN
- capacity (1024 in the current version of html.decl),
- so that URIs in HTML have a bounded length.
-
- -->
-
-
- <!--========= DTD "Macros" =====================-->
-
- <!ENTITY % heading "H1|H2|H3|H4|H5|H6">
-
- <!ENTITY % list " UL | OL | DIR | MENU " >
-
- <!ENTITY % attrs -- common attributes for elements --
- "lang NAME #IMPLIED -- RFC 1766 language tag --">
- <!--or CDATA?-->
-
- <!--======= Character mnemonic entities =================-->
-
- <!ENTITY % ISOlat1 PUBLIC
- "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
- %ISOlat1;
-
- <!ENTITY amp CDATA "&#38;" -- ampersand -->
- <!ENTITY gt CDATA "&#62;" -- greater than -->
- <!ENTITY lt CDATA "&#60;" -- less than -->
- <!ENTITY quot CDATA "&#34;" -- double quote -->
-
- <!--========= SGML Document Access (SDA) Parameter Entities =====-->
-
- <!-- HTML 2.0 contains SGML Document Access (SDA) fixed attributes
- in support of easy transformation to the International Committee
- for Accessible Document Design (ICADD) DTD
- "-//EC-USA-CDA/ICADD//DTD ICADD22//EN".
- ICADD applications are designed to support usable access to
- structured information by print-impaired individuals through
- Braille, large print and voice synthesis. For more information on
- SDA & ICADD:
- - ISO 12083:1993, Annex A.8, Facilities for Braille,
- large print and computer voice
- - ICADD ListServ
- <ICADD%ASUACAD.BITNET@ARIZVM1.ccit.arizona.edu>
- - Usenet news group bit.listserv.easi
- - Recording for the Blind, +1 800 221 4792
- -->
-
- <!ENTITY % SDAFORM "SDAFORM CDATA #FIXED"
- -- one to one mapping -->
- <!ENTITY % SDARULE "SDARULE CDATA #FIXED"
- -- context-sensitive mapping -->
- <!ENTITY % SDAPREF "SDAPREF CDATA #FIXED"
- -- generated text prefix -->
- <!ENTITY % SDASUFF "SDASUFF CDATA #FIXED"
- -- generated text suffix -->
- <!ENTITY % SDASUSP "SDASUSP NAME #FIXED"
- -- suspend transform process -->
-
-
- <!--========= Entities for bidirectional text (BIDI) =========-->
-
- <![ %HTML.Bidi [
-
- <!ENTITY % HTMLbidi PUBLIC
- "-//IETF//ENTITIES bidi//EN//HTML">
- %HTMLbidi;
-
- <!-- The following, together with the BIDI element, allow dealing
- with text containing BIDI controls in the context of marked
- up text. -->
- <!ENTITY lretag "<BIDI DIR=LTR>" >
- <!ENTITY rletag "<BIDI DIR=RTL>" >
- <!ENTITY lrotag "<BIDI FORCE=GAD>" >
- <!ENTITY rlotag "<BIDI FORCE=DAG>" >
- <!ENTITY pdftag "</BIDI>" >
- <!SHORTREF bidi
- "&#LRE;" lretag
- "&#RLE;" rletag
- "&#PDF;" pdftag
- "&#LRO;" lrotag
- "&#RLO;" rlotag
- >
-
- ]]>
-
- <!--========== Text Markup =====================-->
-
- <!ENTITY % loc.values "DATE | TIME | MEASURE | MONEY">
-
- <![ %HTML.Highlighting [
-
- <!ENTITY % font " TT | B | I ">
-
- <!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE | Q">
-
- <![ %HTML.Bidi [
- <!ENTITY % text "#PCDATA | A | IMG | BR | %phrase | %font | LANG | BIDI | %loc.values">
- ]]>
-
- <!ENTITY % text "#PCDATA | A | IMG | BR | %phrase | %font | LANG | %loc.values">
-
- <!ELEMENT (%font;|%phrase) - - (%text)*>
- <!ATTLIST ( TT | CODE | SAMP | KBD | VAR )
- %attrs;
- %SDAFORM; "Lit"
- >
- <!ATTLIST ( B | STRONG )
- %attrs;
- %SDAFORM; "B"
- >
- <!ATTLIST ( I | EM | CITE )
- %attrs;
- %SDAFORM; "It"
- >
-
- <!-- <TT> Typewriter text -->
- <!-- <B> Bold text -->
- <!-- <I> Italic text -->
- <!-- <EM> Emphasized phrase -->
- <!-- <STRONG> Strong emphasis -->
- <!-- <CODE> Source code phrase -->
- <!-- <SAMP> Sample text or characters -->
- <!-- <KBD> Keyboard phrase, e.g. user input -->
- <!-- <VAR> Variable phrase or substitutable -->
- <!-- <CITE> Name or title of cited work -->
-
- <!ENTITY % pre.content "#PCDATA | A | HR | BR | %font | %phrase | LANG">
-
- ]]>
-
- <![ %HTML.Bidi [
-
- <!ENTITY % text "#PCDATA | A | IMG | BR | LANG | BIDI | Q | %loc.values">
-
- <!-- Should the BIDI element have an SDAFORM attr.? Which? -->
- <!ELEMENT BIDI - - (%text)+>
- <!ATTLIST BIDI
- %attrs;
- DIR (ltr|rtl) #IMPLIED
- FORCE (gad|dag) #IMPLIED
- >
-
- <!-- <BIDI> Control bidirectional text -->
-
- ]]>
-
- <!ENTITY % text "#PCDATA | A | IMG | BR | LANG | Q | %loc.values">
-
- <!ELEMENT BR - O EMPTY>
- <!ATTLIST BR
- %SDAPREF; "&#RE;"
- >
-
- <!-- <BR> Line break -->
-
- <!-- Should the LANG element have an SDAFORM attr.? Which? -->
- <!ELEMENT LANG - - (%text)*>
- <!ATTLIST LANG
- %attrs;
- >
-
- <!-- <LANG> Container for language attribute -->
-
- <!ATTLIST Q
- %attrs;
- %SDAFORM; "It" -- to be verified --
- >
- <!-- <Q> Short quotation -->
-
- <!--========= Date, time, measures and monetary amounts ===========-->
-
- <!ELEMENT (%loc.values) - O EMPTY>
- <!ATTLIST DATE
- %attrs;
- CALENDAR CDATA #IMPLIED
- VALUE CDATA #REQUIRED
- >
- <!ATTLIST TIME
- %attrs;
- ZONE CDATA #IMPLIED
- VALUE CDATA #REQUIRED
- >
- <!ATTLIST MEASURE
- %attrs;
- TYPE (weight|count|length|area|volume) #REQUIRED
- UNIT CDATA #IMPLIED
- VALUE CDATA #REQUIRED
- >
- <!ATTLIST MONEY
- %attrs;
- UNIT CDATA #REQUIRED
- VALUE CDATA #REQUIRED
- >
-
- <!-- DATE A date value -->
- <!-- TIME A time value -->
- <!-- MEASURE A measurement (length, weight, etc) -->
- <!-- MONEY A monetary amount -->
-
- <!--========= Link Markup ======================-->
-
- <!ENTITY % linkType "NAME">
-
- <!ENTITY % linkExtraAttributes
- "REL %linkType #IMPLIED
- REV %linkType #IMPLIED
- URN CDATA #IMPLIED
- TITLE CDATA #IMPLIED
- METHODS NAMES #IMPLIED
- CHARSET NAME #IMPLIED
- ">
-
- <![ %HTML.Recommended [
- <!ENTITY % A.content "(%text)*"
- -- <H1><a name="xxx">Heading</a></H1>
- is preferred to
- <a name="xxx"><H1>Heading</H1></a>
- -->
- ]]>
-
- <!ENTITY % A.content "(%heading|%text)*">
-
- <!ELEMENT A - - %A.content -(A)>
- <!ATTLIST A
- %attrs;
- HREF %URI #IMPLIED
- NAME CDATA #IMPLIED
- %linkExtraAttributes;
- %SDAPREF; "<Anchor: #AttList>"
- >
- <!-- <A> Anchor; source/destination of link -->
- <!-- <A NAME="..."> Name of this anchor -->
- <!-- <A HREF="..."> Address of link destination -->
- <!-- <A URN="..."> Permanent address of destination -->
- <!-- <A REL=...> Relationship to destination -->
- <!-- <A REV=...> Relationship of destination to this -->
- <!-- <A TITLE="..."> Title of destination (advisory) -->
- <!-- <A CHARSET="..."> Charset of destination (advisory) -->
- <!-- <A METHODS="..."> Operations on destination (advisory) -->
-
-
- <!--========== Images ==========================-->
-
- <!ELEMENT IMG - O EMPTY>
- <!ATTLIST IMG
- %attrs;
- SRC %URI; #REQUIRED
- ALT CDATA #IMPLIED
- ALIGN (top|middle|bottom) #IMPLIED
- ISMAP (ISMAP) #IMPLIED
- %SDAPREF; "<Fig><?SDATrans Img: #AttList>#AttVal(Alt)</Fig>"
- >
-
- <!-- <IMG> Image; icon, glyph or illustration -->
- <!-- <IMG SRC="..."> Address of image object -->
- <!-- <IMG ALT="..."> Textual alternative -->
- <!-- <IMG ALIGN=...> Position relative to text -->
- <!-- <IMG ISMAP> Each pixel can be a link -->
-
- <!--========== Paragraphs=======================-->
-
- <!ELEMENT P - O (%text)*>
- <!ATTLIST P
- %attrs;
- %SDAFORM; "Para"
- >
-
- <!-- <P> Paragraph -->
-
- <!--========== Headings, Titles, Sections ===============-->
-
- <!ELEMENT HR - O EMPTY>
- <!ATTLIST HR
- %attrs;
- %SDAPREF; "&#RE;&#RE;"
- >
-
- <!-- <HR> Horizontal rule -->
-
- <!ELEMENT ( %heading ) - - (%text;)*>
- <!ATTLIST H1
- %attrs;
- %SDAFORM; "H1"
- >
- <!ATTLIST H2
- %attrs;
- %SDAFORM; "H2"
- >
- <!ATTLIST H3
- %attrs;
- %SDAFORM; "H3"
- >
- <!ATTLIST H4
- %attrs;
- %SDAFORM; "H4"
- >
- <!ATTLIST H5
- %attrs;
- %SDAFORM; "H5"
- >
- <!ATTLIST H6
- %attrs;
- %SDAFORM; "H6"
- >
-
- <!-- <H1> Heading, level 1 -->
- <!-- <H2> Heading, level 2 -->
- <!-- <H3> Heading, level 3 -->
- <!-- <H4> Heading, level 4 -->
- <!-- <H5> Heading, level 5 -->
- <!-- <H6> Heading, level 6 -->
-
-
- <!--========== Text Flows ======================-->
-
- <![ %HTML.Forms [
- <!ENTITY % block.forms "BLOCKQUOTE | FORM | ISINDEX">
- ]]>
-
- <!ENTITY % block.forms "BLOCKQUOTE">
-
- <![ %HTML.Deprecated [
- <!ENTITY % preformatted "PRE | XMP | LISTING">
- ]]>
-
- <!ENTITY % preformatted "PRE">
-
- <!ENTITY % block "P | %list | DL
- | %preformatted
- | %block.forms">
-
- <!ENTITY % flow "(%text|%block)*">
-
- <!ENTITY % pre.content "#PCDATA | A | HR | BR | LANG">
- <!ELEMENT PRE - - (%pre.content)*>
- <!ATTLIST PRE
- %attrs;
- WIDTH NUMBER #implied
- %SDAFORM; "Lit"
- >
-
- <!-- <PRE> Preformatted text -->
- <!-- <PRE WIDTH=...> Maximum characters per line -->
-
- <![ %HTML.Deprecated [
-
- <!ENTITY % literal "CDATA"
- -- historical, non-conforming parsing mode where
- the only markup signal is the end tag
- in full
- -->
-
- <!ELEMENT (XMP|LISTING) - - %literal>
- <!ATTLIST XMP
- %attrs;
- %SDAFORM; "Lit"
- %SDAPREF; "Example:&#RE;"
- >
- <!ATTLIST LISTING
- %attrs;
- %SDAFORM; "Lit"
- %SDAPREF; "Listing:&#RE;"
- >
-
- <!-- <XMP> Example section -->
- <!-- <LISTING> Computer listing -->
-
- <!ELEMENT PLAINTEXT - O %literal>
- <!-- <PLAINTEXT> Plain text passage -->
-
- <!ATTLIST PLAINTEXT
- %attrs;
- %SDAFORM; "Lit"
- >
- ]]>
-
-
- <!--========== Lists ==================-->
-
- <!ELEMENT DL - - (DT | DD)+>
- <!ATTLIST DL
- %attrs;
- COMPACT (COMPACT) #IMPLIED
- %SDAFORM; "List"
- %SDAPREF; "Definition List:"
- >
-
- <!ELEMENT DT - O (%text)*>
- <!ATTLIST DT
- %attrs;
- %SDAFORM; "Term"
- >
-
- <!ELEMENT DD - O %flow>
- <!ATTLIST DD
- %attrs;
- %SDAFORM; "LItem"
- >
-
- <!-- <DL> Definition list, or glossary -->
- <!-- <DL COMPACT> Compact style list -->
- <!-- <DT> Term in definition list -->
- <!-- <DD> Definition of term -->
-
- <!ELEMENT (OL|UL) - - (LI)+>
- <!ATTLIST OL
- %attrs;
- COMPACT (COMPACT) #IMPLIED
- %SDAFORM; "List"
- >
- <!ATTLIST UL
- %attrs;
- COMPACT (COMPACT) #IMPLIED
- %SDAFORM; "List"
- >
- <!-- <UL> Unordered list -->
- <!-- <UL COMPACT> Compact list style -->
- <!-- <OL> Ordered, or numbered list -->
- <!-- <OL COMPACT> Compact list style -->
-
-
- <!ELEMENT (DIR|MENU) - - (LI)+ -(%block)>
- <!ATTLIST DIR
- %attrs;
- COMPACT (COMPACT) #IMPLIED
- %SDAFORM; "List"
- %SDAPREF; "<LHead>Directory</LHead>"
- >
- <!ATTLIST MENU
- %attrs;
- COMPACT (COMPACT) #IMPLIED
- %SDAFORM; "List"
- %SDAPREF; "<LHead>Menu</LHead>"
- >
-
- <!-- <DIR> Directory list -->
- <!-- <DIR COMPACT> Compact list style -->
- <!-- <MENU> Menu list -->
- <!-- <MENU COMPACT> Compact list style -->
-
- <!ELEMENT LI - O %flow>
- <!ATTLIST LI
- %attrs;
- %SDAFORM; "LItem"
- >
-
- <!-- <LI> List item -->
-
- <!--========== Document Body ===================-->
-
- <![ %HTML.Recommended [
- <!ENTITY % body.content "(%heading|%block|HR|ADDRESS|IMG)*"
- -- <h1>Heading</h1>
- <p>Text ...
- is preferred to
- <h1>Heading</h1>
- Text ...
- -->
- ]]>
-
- <!ENTITY % body.content "(%heading | %text | %block | HR | ADDRESS)*">
-
- <!ELEMENT BODY O O %body.content>
- <!ATTLIST BODY
- %attrs;
- >
-
- <!-- <BODY> Document body -->
-
- <!ELEMENT BLOCKQUOTE - - %body.content>
- <!ATTLIST BLOCKQUOTE
- %attrs;
- %SDAFORM; "BQ"
- >
-
- <!-- <BLOCKQUOTE> Quoted passage -->
-
- <!ELEMENT ADDRESS - - (%text|P)*>
- <!ATTLIST ADDRESS
- %attrs;
- %SDAFORM; "Lit"
- %SDAPREF; "Address:&#RE;"
- >
-
- <!-- <ADDRESS> Address, signature, or byline -->
-
-
- <!--======= Forms ====================-->
-
- <![ %HTML.Forms [
-
- <!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)>
- <!ATTLIST FORM
- %attrs;
- ACTION %URI #IMPLIED
- METHOD (%HTTP-Method) GET
- ENCTYPE %Content-Type; "application/x-www-form-urlencoded"
- ACCEPT-CHARSET CDATA #IMPLIED
- %SDAPREF; "<Para>Form:</Para>"
- %SDASUFF; "<Para>Form End.</Para>"
- >
-
- <!-- <FORM> Fill-out or data-entry form -->
- <!-- <FORM ACTION="..."> Address for completed form -->
- <!-- <FORM METHOD=...> Method of submitting form -->
- <!-- <FORM ENCTYPE="..."> Representation of form data -->
-
- <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
- RADIO | SUBMIT | RESET |
- IMAGE | HIDDEN | DATE |
- TIME | MEASURE | MONEY |
- FILE)">
- <!ELEMENT INPUT - O EMPTY>
- <!ATTLIST INPUT
- %attrs;
- TYPE %InputType TEXT
- NAME CDATA #IMPLIED
- VALUE CDATA #IMPLIED
- SRC %URI #IMPLIED
- CHECKED (CHECKED) #IMPLIED
- SIZE CDATA #IMPLIED
- MAXLENGTH NUMBER #IMPLIED
- ALIGN (top|middle|bottom) #IMPLIED
- ACCEPT CDATA #IMPLIED --list of content types --
- %SDAPREF; "Input: "
- >
-
- <!-- <INPUT> Form input datum -->
- <!-- <INPUT TYPE=...> Type of input interaction -->
- <!-- <INPUT NAME=...> Name of form datum -->
- <!-- <INPUT VALUE="..."> Default/initial/selected value -->
- <!-- <INPUT SRC="..."> Address of image -->
- <!-- <INPUT CHECKED> Initial state is "on" -->
- <!-- <INPUT SIZE=...> Field size hint -->
- <!-- <INPUT MAXLENGTH=...> Data length maximum -->
- <!-- <INPUT ALIGN=...> Image alignment -->
-
- <!ELEMENT SELECT - - (OPTION+) -(INPUT|SELECT|TEXTAREA)>
- <!ATTLIST SELECT
- %attrs;
- NAME CDATA #REQUIRED
- SIZE NUMBER #IMPLIED
- MULTIPLE (MULTIPLE) #IMPLIED
- %SDAFORM; "List"
- %SDAPREF;
- "<LHead>Select #AttVal(Multiple)</LHead>"
- >
-
- <!-- <SELECT> Selection of option(s) -->
- <!-- <SELECT NAME=...> Name of form datum -->
- <!-- <SELECT SIZE=...> Options displayed at a time -->
- <!-- <SELECT MULTIPLE> Multiple selections allowed -->
-
- <!ELEMENT OPTION - O (#PCDATA)*>
- <!ATTLIST OPTION
- %attrs;
- SELECTED (SELECTED) #IMPLIED
- VALUE CDATA #IMPLIED
- %SDAFORM; "LItem"
- %SDAPREF;
- "Option: #AttVal(Value) #AttVal(Selected)"
- >
-
- <!-- <OPTION> A selection option -->
- <!-- <OPTION SELECTED> Initial state -->
- <!-- <OPTION VALUE="..."> Form datum value for this option-->
-
- <!ELEMENT TEXTAREA - - (#PCDATA)* -(INPUT|SELECT|TEXTAREA)>
- <!ATTLIST TEXTAREA
- %attrs;
- NAME CDATA #REQUIRED
- ROWS NUMBER #REQUIRED
- COLS NUMBER #REQUIRED
- %SDAFORM; "Para"
- %SDAPREF; "Input Text -- #AttVal(Name): "
- >
-
- <!-- <TEXTAREA> An area for text input -->
- <!-- <TEXTAREA NAME=...> Name of form datum -->
- <!-- <TEXTAREA ROWS=...> Height of area -->
- <!-- <TEXTAREA COLS=...> Width of area -->
-
- ]]>
-
-
- <!--======= Document Head ======================-->
-
- <![ %HTML.Recommended [
- <!ENTITY % head.extra "">
- ]]>
- <!ENTITY % head.extra "& NEXTID?">
-
- <!ENTITY % head.content "TITLE & ISINDEX? & BASE? %head.extra">
-
- <!ELEMENT HEAD O O (%head.content) +(META|LINK)>
-
- <!-- <HEAD> Document head -->
-
- <!ELEMENT TITLE - - (#PCDATA)*>
- <!ATTLIST TITLE
- %attrs;
- %SDAFORM; "Ti" >
-
- <!-- <TITLE> Title of document -->
-
- <!ELEMENT LINK - O EMPTY>
- <!ATTLIST LINK
- %attrs;
- HREF %URI #REQUIRED
- %linkExtraAttributes;
- %SDAPREF; "Linked to : #AttVal (TITLE) (URN) (HREF)>" >
-
- <!-- <LINK> Link from this document -->
- <!-- <LINK HREF="..."> Address of link destination -->
- <!-- <LINK URN="..."> Lasting name of destination -->
- <!-- <LINK REL=...> Relationship to destination -->
- <!-- <LINK REV=...> Relationship of destination to this -->
- <!-- <LINK TITLE="..."> Title of destination (advisory) -->
- <!-- <LINK CHARSET="..."> Charset of destination (advisory) -->
- <!-- <LINK METHODS="..."> Operations allowed (advisory) -->
-
- <!ELEMENT ISINDEX - O EMPTY>
- <!ATTLIST ISINDEX
- %attrs;
- %SDAPREF;
- "<Para>[Document is indexed/searchable.]</Para>">
-
- <!-- <ISINDEX> Document is a searchable index -->
-
- <!ELEMENT BASE - O EMPTY>
- <!ATTLIST BASE
- HREF %URI; #REQUIRED >
-
- <!-- <BASE> Base context document -->
- <!-- <BASE HREF="..."> Address for this document -->
-
- <!ELEMENT NEXTID - O EMPTY>
- <!ATTLIST NEXTID
- N CDATA #REQUIRED >
-
- <!-- <NEXTID> Next ID to use for link name -->
- <!-- <NEXTID N=...> Next ID to use for link name -->
-
- <!ELEMENT META - O EMPTY>
- <!ATTLIST META
- HTTP-EQUIV NAME #IMPLIED
- NAME NAME #IMPLIED
- CONTENT CDATA #REQUIRED
- >
-
- <!-- <META> Generic Metainformation -->
- <!-- <META HTTP-EQUIV=...> HTTP response header name -->
- <!-- <META NAME=...> Metainformation name -->
- <!-- <META CONTENT="..."> Associated information -->
-
- <!--======= Document Structure =================-->
-
- <![ %HTML.Deprecated [
- <!ENTITY % html.content "HEAD, BODY, PLAINTEXT?">
- ]]>
- <!ENTITY % html.content "HEAD, BODY">
-
- <!ELEMENT HTML O O (%html.content)>
- <!ENTITY % version.attr "VERSION CDATA #FIXED '%HTML.Version;'">
-
- <!ATTLIST HTML
- %attrs;
- %version.attr;
- %SDAFORM; "Book"
- >
-
- <!-- <HTML> HTML Document -->
-
-
- 7.2. SGML Declaration for HTML
-
- <!SGML "ISO 8879:1986"
- --
- SGML Declaration for HyperText Markup Language version 2.x
- (HTML 2.x).
-
- --
-
- CHARSET
- BASESET "ISO Registration Number 176//CHARSET
- ISO/IEC 10646-1:1993 UCS-2 with
- implementation level 3//ESC 2/5 2/15 4/5"
- DESCSET 0 9 UNUSED
- 9 2 9
- 11 2 UNUSED
- 13 1 13
- 14 18 UNUSED
- 32 95 32
- 127 1 UNUSED
- 128 32 UNUSED
- 160 65376 160
-
-
- CAPACITY SGMLREF
- TOTALCAP 150000
- GRPCAP 150000
- ENTCAP 150000
-
- SCOPE DOCUMENT
- SYNTAX
- SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
- 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127
- BASESET "ISO Registration Number 176//CHARSET
- ISO/IEC 10646-1:1993 UCS-2 with
- implementation level 3//ESC 2/5 2/15 4/5"
- DESCSET 0 65536 0
- FUNCTION
- RE 13
- RS 10
- SPACE 32
- TAB SEPCHAR 9
- EN-QUAD SEPCHAR 8192
- EM-QUAD SEPCHAR 8193
- EN-SPACE SEPCHAR 8194
- EM-SPACE SEPCHAR 8195
- T-P-E-SP SEPCHAR 8196
- F-P-E-SP SEPCHAR 8197
- S-P-E-SP SEPCHAR 8198
- FIG-SP SEPCHAR 8199
- PUNC-SP SEPCHAR 8200
- THIN-SP SEPCHAR 8201
- HAIR-SP SEPCHAR 8202
- Z-W-SP SEPCHAR 8203
- IDEO-SP SEPCHAR 12288
- LRE FUNCHAR 8234 -- LEFT-TO-RIGHT EMBEDDING --
- RLE FUNCHAR 8235 -- RIGHT-TO-LEFT EMBEDDING --
- PDF FUNCHAR 8236 -- POP DIRECTIONAL FORMATTING --
- LRO FUNCHAR 8237 -- LEFT-TO-RIGHT OVERRIDE --
- RLO FUNCHAR 8238 -- RIGHT-TO-LEFT OVERRIDE --
-
- NAMING LCNMSTRT ""
- UCNMSTRT ""
- LCNMCHAR ".-"
- UCNMCHAR ".-"
- NAMECASE GENERAL YES
- ENTITY NO
- DELIM GENERAL SGMLREF
- SHORTREF SGMLREF
- "&#LRE;" -- LEFT-TO-RIGHT EMBEDDING --
- "&#RLE;" -- RIGHT-TO-LEFT EMBEDDING --
- "&#PDF;" -- POP DIRECTIONAL FORMATTING --
- "&#LRO;" -- LEFT-TO-RIGHT OVERRIDE --
- "&#RLO;" -- RIGHT-TO-LEFT OVERRIDE --
- NAMES SGMLREF
- QUANTITY SGMLREF
- ATTSPLEN 2100
- LITLEN 1024
- NAMELEN 72 -- somewhat arbitrary; taken from
- internet line length conventions --
- PILEN 1024
- TAGLVL 100
- TAGLEN 2100
- GRPGTCNT 150
- GRPCNT 64
-
- FEATURES
- MINIMIZE
- DATATAG NO
- OMITTAG YES
- RANK NO
- SHORTTAG YES
- LINK
- SIMPLE NO
- IMPLICIT NO
- EXPLICIT NO
- OTHER
- CONCUR NO
- SUBDOC NO
- FORMAL YES
- APPINFO "SDA" -- conforming SGML Document Access application
- --
- >
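
As a reading aid, the DESCSET of the document character set above can
be transcribed into a small Python table to check which code positions
are declared UNUSED. The sketch only restates the numbers given in the
declaration; the table layout and the function name are assumptions
made for the example.

```python
# (first position, number of positions, base position or None if UNUSED),
# copied from the CHARSET DESCSET of the declaration.
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 65376, 160),
]


def describe(position):
    """Return the base-set position for a code position, or None if UNUSED."""
    for first, count, base in DESCSET:
        if first <= position < first + count:
            return None if base is None else base + (position - first)
    raise ValueError("position outside the described range")


# The ranges cover the 65536 positions of UCS-2 exactly once.
assert sum(count for _, count, _ in DESCSET) == 65536

print(describe(65))      # LATIN CAPITAL LETTER A is usable
print(describe(12))      # form feed is UNUSED
print(describe(8234))    # LEFT-TO-RIGHT EMBEDDING is usable
```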
-
-
- 7.3. Entity sets
-
- 7.3.1. ISO Latin 1 Character Entity Set
-
- The following public text lists each of the characters specified in the
- Added Latin 1 entity set, along with its name, syntax for use, and
- description. This list is derived from ISO Standard 8879:1986//ENTITIES
- Added Latin 1//EN. HTML includes the entire entity set, and adds enti-
- ties for all missing characters in the right part of ISO-8859-1.
-
- <!-- (C) International Organization for Standardization 1986
- Permission to copy in any form is granted for use with
- conforming SGML systems and applications as defined in
- ISO 8879, provided this notice is included in all copies.
- -->
- <!-- Character entity set. Typical invocation:
- <!ENTITY % ISOlat1 PUBLIC
- "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
- %ISOlat1;
- -->
-
- <!ENTITY nbsp CDATA " " -- no-break space -->
- <!ENTITY iexcl CDATA "¡" -- inverted exclamation mark -->
- <!ENTITY cent CDATA "¢" -- cent sign -->
- <!ENTITY pound CDATA "£" -- pound sterling sign -->
- <!ENTITY curren CDATA "¤" -- general currency sign -->
- <!ENTITY yen CDATA "¥" -- yen sign -->
- <!ENTITY brvbar CDATA "¦" -- broken (vertical) bar -->
- <!ENTITY sect CDATA "§" -- section sign -->
- <!ENTITY uml CDATA "¨" -- umlaut (dieresis) -->
- <!ENTITY copy CDATA "©" -- copyright sign -->
- <!ENTITY ordf CDATA "ª" -- ordinal indicator, feminine -->
- <!ENTITY laquo CDATA "«" -- angle quotation mark, left -->
- <!ENTITY not CDATA "¬" -- not sign -->
- <!ENTITY shy CDATA "&#173;" -- soft hyphen -->
- <!ENTITY reg CDATA "®" -- registered sign -->
- <!ENTITY macr CDATA "¯" -- macron -->
- <!ENTITY deg CDATA "°" -- degree sign -->
- <!ENTITY plusmn CDATA "±" -- plus-or-minus sign -->
- <!ENTITY sup2 CDATA "²" -- superscript two -->
- <!ENTITY sup3 CDATA "³" -- superscript three -->
- <!ENTITY acute CDATA "´" -- acute accent -->
- <!ENTITY micro CDATA "µ" -- micro sign -->
- <!ENTITY para CDATA "¶" -- pilcrow (paragraph sign) -->
- <!ENTITY middot CDATA "·" -- middle dot -->
- <!ENTITY cedil CDATA "¸" -- cedilla -->
- <!ENTITY sup1 CDATA "¹" -- superscript one -->
- <!ENTITY ordm CDATA "º" -- ordinal indicator, masculine -->
- <!ENTITY raquo CDATA "»" -- angle quotation mark, right -->
- <!ENTITY frac14 CDATA "¼" -- fraction one-quarter -->
- <!ENTITY frac12 CDATA "½" -- fraction one-half -->
- <!ENTITY frac34 CDATA "¾" -- fraction three-quarters -->
- <!ENTITY iquest CDATA "¿" -- inverted question mark -->
- <!ENTITY Agrave CDATA "À" -- capital A, grave accent -->
- <!ENTITY Aacute CDATA "Á" -- capital A, acute accent -->
- <!ENTITY Acirc CDATA "Â" -- capital A, circumflex accent -->
- <!ENTITY Atilde CDATA "Ã" -- capital A, tilde -->
- <!ENTITY Auml CDATA "Ä" -- capital A, dieresis or umlaut mark -->
- <!ENTITY Aring CDATA "Å" -- capital A, ring -->
- <!ENTITY AElig CDATA "Æ" -- capital AE diphthong (ligature) -->
- <!ENTITY Ccedil CDATA "Ç" -- capital C, cedilla -->
- <!ENTITY Egrave CDATA "È" -- capital E, grave accent -->
- <!ENTITY Eacute CDATA "É" -- capital E, acute accent -->
- <!ENTITY Ecirc CDATA "Ê" -- capital E, circumflex accent -->
- <!ENTITY Euml CDATA "Ë" -- capital E, dieresis or umlaut mark -->
- <!ENTITY Igrave CDATA "Ì" -- capital I, grave accent -->
- <!ENTITY Iacute CDATA "Í" -- capital I, acute accent -->
- <!ENTITY Icirc CDATA "Î" -- capital I, circumflex accent -->
- <!ENTITY Iuml CDATA "Ï" -- capital I, dieresis or umlaut mark -->
- <!ENTITY ETH CDATA "Ð" -- capital Eth, Icelandic -->
- <!ENTITY Ntilde CDATA "Ñ" -- capital N, tilde -->
- <!ENTITY Ograve CDATA "Ò" -- capital O, grave accent -->
- <!ENTITY Oacute CDATA "Ó" -- capital O, acute accent -->
- <!ENTITY Ocirc CDATA "Ô" -- capital O, circumflex accent -->
- <!ENTITY Otilde CDATA "Õ" -- capital O, tilde -->
- <!ENTITY Ouml CDATA "Ö" -- capital O, dieresis or umlaut mark -->
- <!ENTITY times CDATA "×" -- multiply sign -->
- <!ENTITY Oslash CDATA "Ø" -- capital O, slash -->
- <!ENTITY Ugrave CDATA "Ù" -- capital U, grave accent -->
- <!ENTITY Uacute CDATA "Ú" -- capital U, acute accent -->
- <!ENTITY Ucirc CDATA "Û" -- capital U, circumflex accent -->
- <!ENTITY Uuml CDATA "Ü" -- capital U, dieresis or umlaut mark -->
- <!ENTITY Yacute CDATA "Ý" -- capital Y, acute accent -->
- <!ENTITY THORN CDATA "Þ" -- capital Thorn, Icelandic -->
- <!ENTITY szlig CDATA "ß" -- small sharp s, German (sz ligature) -->
- <!ENTITY agrave CDATA "à" -- small a, grave accent -->
- <!ENTITY aacute CDATA "á" -- small a, acute accent -->
- <!ENTITY acirc CDATA "â" -- small a, circumflex accent -->
- <!ENTITY atilde CDATA "ã" -- small a, tilde -->
- <!ENTITY auml CDATA "ä" -- small a, dieresis or umlaut mark -->
- <!ENTITY aring CDATA "å" -- small a, ring -->
- <!ENTITY aelig CDATA "æ" -- small ae diphthong (ligature) -->
- <!ENTITY ccedil CDATA "ç" -- small c, cedilla -->
- <!ENTITY egrave CDATA "è" -- small e, grave accent -->
- <!ENTITY eacute CDATA "é" -- small e, acute accent -->
- <!ENTITY ecirc CDATA "ê" -- small e, circumflex accent -->
- <!ENTITY euml CDATA "ë" -- small e, dieresis or umlaut mark -->
- <!ENTITY igrave CDATA "ì" -- small i, grave accent -->
- <!ENTITY iacute CDATA "í" -- small i, acute accent -->
- <!ENTITY icirc CDATA "î" -- small i, circumflex accent -->
- <!ENTITY iuml CDATA "ï" -- small i, dieresis or umlaut mark -->
- <!ENTITY eth CDATA "ð" -- small eth, Icelandic -->
- <!ENTITY ntilde CDATA "ñ" -- small n, tilde -->
- <!ENTITY ograve CDATA "ò" -- small o, grave accent -->
- <!ENTITY oacute CDATA "ó" -- small o, acute accent -->
- <!ENTITY ocirc CDATA "ô" -- small o, circumflex accent -->
- <!ENTITY otilde CDATA "õ" -- small o, tilde -->
- <!ENTITY ouml CDATA "ö" -- small o, dieresis or umlaut mark -->
- <!ENTITY divide CDATA "÷" -- divide sign -->
- <!ENTITY oslash CDATA "ø" -- small o, slash -->
- <!ENTITY ugrave CDATA "ù" -- small u, grave accent -->
- <!ENTITY uacute CDATA "ú" -- small u, acute accent -->
- <!ENTITY ucirc CDATA "û" -- small u, circumflex accent -->
- <!ENTITY uuml CDATA "ü" -- small u, dieresis or umlaut mark -->
- <!ENTITY yacute CDATA "ý" -- small y, acute accent -->
- <!ENTITY thorn CDATA "þ" -- small thorn, Icelandic -->
- <!ENTITY yuml CDATA "ÿ" -- small y, dieresis or umlaut mark -->
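-
- The correspondence between these entity names and the code positions
- 160 to 255 can be checked mechanically. The Python sketch below is
- illustrative only. It relies on the standard library table
- html.entities.name2codepoint, which describes a later HTML entity
- list; treating its Latin 1 names as identical to the ones declared
- above is an assumption. The function names latin1_entities and
- declaration are hypothetical.

```python
from html.entities import name2codepoint


def latin1_entities():
    """Return {entity name: character} for code positions 160 to 255."""
    return {name: chr(cp) for name, cp in name2codepoint.items()
            if 160 <= cp <= 255}


def declaration(name):
    """Format one entity declaration using a numeric character reference."""
    return '<!ENTITY %s CDATA "&#%d;">' % (name, name2codepoint[name])


if __name__ == "__main__":
    table = latin1_entities()
    # Print the declarations in code position order.
    for name in sorted(table, key=lambda n: ord(table[n])):
        print(declaration(name))
```

- Writing each value as a numeric character reference keeps the
- declarations unambiguous even for characters with no visible glyph,
- such as the no-break space and the soft hyphen.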
-
-
- 7.3.2. BIDI Entity Set
-
- The following entity set is sufficient to support the full Unicode
- bidirectional algorithm.
-
- <!-- Character entity set. Typical invocation:
- <!ENTITY % HTMLbidi PUBLIC
- "-//IETF//ENTITIES bidi//EN//HTML">
- %HTMLbidi;
- -->
- <!ENTITY zwnj SDATA "&#8204;"--=zero width non-joiner-->
- <!ENTITY zwj SDATA "&#8205;"--=zero width joiner-->
- <!ENTITY lrm SDATA "&#8206;"--=left-to-right mark-->
- <!ENTITY rlm SDATA "&#8207;"--=right-to-left mark-->
- <!ENTITY lre SDATA "&#8234;"--=left-to-right embedding-->
- <!ENTITY rle SDATA "&#8235;"--=right-to-left embedding-->
- <!ENTITY pdf SDATA "&#8236;"--=pop directional formatting-->
- <!ENTITY lro SDATA "&#8237;"--=left-to-right override-->
- <!ENTITY rlo SDATA "&#8238;"--=right-to-left override-->
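-
- A short Python sketch, given only as an illustration, shows how the
- nine entity names relate to Unicode characters and their
- bidirectional categories. The code positions in the table are the
- Unicode values of the named characters (those for LRO and RLO match
- the FUNCHAR values in the SGML declaration); the names BIDI_ENTITIES
- and describe are hypothetical.

```python
import unicodedata

# Entity name -> Unicode code position (decimal).
BIDI_ENTITIES = {
    "zwnj": 8204, "zwj": 8205, "lrm": 8206, "rlm": 8207,
    "lre": 8234, "rle": 8235, "pdf": 8236, "lro": 8237, "rlo": 8238,
}


def describe(name):
    """Return (Unicode character name, bidirectional category)."""
    ch = chr(BIDI_ENTITIES[name])
    return unicodedata.name(ch), unicodedata.bidirectional(ch)


if __name__ == "__main__":
    for entity in BIDI_ENTITIES:
        print("&%s;" % entity, *describe(entity))
```

- The marks (lrm, rlm) behave as strong directional characters, whereas
- the embedding and override codes open a scope that pdf closes.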
-
-
- Bibliography
-
- [BRYAN88] M. Bryan, "SGML -- An Author's Guide to the Standard
- Generalized Markup Language", Addison-Wesley, Reading,
- 1988.
-
- [ERCS] Extended Reference Concrete Syntax for SGML.
- <http://www.sgmlopen.org/sgml/docs/ercs/ercs-home.html>
-
- [FILE-UPLOAD] E. Nebel and L. Masinter, "Form-based File Upload in
- HTML", Work in progress (draft-ietf-html-
- fileupload-02.txt), Xerox Corporation, April 1995.
-
- [GOLD90] C. F. Goldfarb, "The SGML Handbook", Y. Rubinsky, Ed.,
- Oxford University Press, 1990.
-
- [HTML-2] T. Berners-Lee and D. Connolly, "Hypertext Markup Lan-
- guage - 2.0", Work in progress (draft-ietf-html-
- spec-02.txt), MIT/W3C, May 1995.
-
- [HTTP] T. Berners-Lee, R. T. Fielding, and H. Frystyk
- Nielsen, "Hypertext Transfer Protocol - HTTP/1.0",
- Work in progress (draft-ietf-http-v10-spec-00.ps),
- MIT, UC Irvine, CERN, March 1995.
-
- [ISO-639] ISO 639:1988. Codes pour la représentation des noms de
- langue. Technical content in
- <http://www.sil.org/sgml/iso639a.html>
-
- [ISO-CD-639-2] ISO CD 639-2:1992. Technical content in
- <http://www.sil.org/sgml/iso639-2a.html>
-
- [ISO-1000] ISO 1000:1992. Unités SI et recommandations pour
- l'emploi de leurs multiples et de certaines autres
- unités.
-
- [ISO-3166] ISO 3166:1993. Codes pour la représentation des noms
- de pays.
-
- [ISO-4217] ISO 4217:1990. Codes pour la représentation des
- monnaies et types des fonds.
-
- [ISO-8601] ISO 8601:1988. Éléments de données et formats
- d'échange -- Échange d'information -- Représentation
- de la date et de l'heure.
-
- [ISO-8859-1] ISO 8859-1:1987. International Standard -- Informa-
- tion Processing -- 8-bit Single-Byte Coded Graphic
- Character Sets -- Part 1: Latin Alphabet No. 1.
-
- [ISO-8879] ISO 8879:1986. International Standard -- Information
- Processing -- Text and Office Systems -- Standard Gen-
- eralized Markup Language (SGML).
-
- [ISO-10646] ISO/IEC 10646-1:1993. International Standard -- Infor-
- mation technology -- Universal Multiple-Octet Coded
- Character Set (UCS) -- Part 1: Architecture and Basic
- Multilingual Plane.
-
- [NICOL] G.T. Nicol, "The Multilingual World Wide Web", Elec-
- tronic Book Technologies, 1995,
- <http://www.ebt.com/docs/multling.html>
-
- [RFC1468] J. Murai, M. Crispin and E. van der Poel, "Japanese
- Character Encoding for Internet Messages", RFC 1468,
- Keio University, Panda Programming, June 1993.
-
- [RFC1521] N. Borenstein and N. Freed, "MIME (Multipurpose Inter-
- net Mail Extensions) Part One: Mechanisms for Specify-
- ing and Describing the Format of Internet Message Bod-
- ies", RFC 1521, Bellcore, Innosoft, September 1993.
-
- [RFC1590] J. Postel, "Media Type Registration Procedure", RFC
- 1590, USC/ISI, March 1994.
-
- [RFC1738] T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform
- Resource Locators (URL)", RFC 1738, CERN, Xerox PARC,
- University of Minnesota, October 1994.
-
- [RFC1766] H. Alvestrand, "Tags for the Identification of
- Languages", RFC 1766, UNINETT, March 1995.
-
- [SQ91] SoftQuad, "The SGML Primer", 3rd ed., SoftQuad Inc.,
- 1991.
-
- [TAKADA] Toshihiro Takada, "Multilingual Information Exchange
- through the World-Wide Web", Computer Networks and
- ISDN Systems, Vol. 27, No. 2, Nov. 1994, pp. 235-241.
-
- [TEI] TEI Guidelines for Electronic Text Encoding and
- Interchange. <http://etext.virginia.edu/TEI.html>
-
- [UNICODE] The Unicode Consortium, "The Unicode Standard --
- Worldwide Character Encoding -- Version 1.0", Addison-
- Wesley, Volume 1, 1991, Volume 2, 1992. The BIDI
- algorithm is in appendix A of volume 1, with correc-
- tions in appendix D of volume 2.
-
- [VANH90] E. van Herwijnen, "Practical SGML", Kluwer Academic
- Publishers Group, Norwell and Dordrecht, 1990.
-
- Authors' Addresses
-
- François Yergeau
- Alis Technologies
- 3410, rue Griffith
- Montréal QC H4T 1A7
- Canada
-
- Tel: +1 (514) 738-9171
- Fax: +1 (514) 342-0318
- EMail: yergeau@alis.ca
-
-
- Gavin Thomas Nicol
- Electronic Book Technologies, Japan
- 1-29-9 Tsurumaki,
- Setagaya-ku,
- Tokyo
- Japan
-
- Tel + Fax: +81-3-3706-7351
- EMail: gtn@ebt.com, gtn@twics.co.jp
-
-
- Glenn Adams
- Stonehand
- 118 Magazine Street
- Cambridge, MA 02139
- U.S.A.
-
- Tel: +1 (617) 864-5524
- Fax: +1 (617) 864-4965
- EMail: glenn@stonehand.com
-
-
- Martin J. Duerst
- Multimedia-Laboratory
- Department of Computer Science
- University of Zurich
- Winterthurerstrasse 190
- CH-8057 Zurich
- Switzerland
-
- Tel: +41 1 257 43 16
- Fax: +41 1 363 00 35
- EMail: mduerst@ifi.unizh.ch
-